Learning from Demonstrations for Real World Reinforcement Learning

Authors

  • Todd Hester
  • Matej Vecerik
  • Olivier Pietquin
  • Marc Lanctot
  • Tom Schaul
  • Bilal Piot
  • Andrew Sendonaris
  • Gabriel Dulac-Arnold
  • Ian Osband
  • John Agapiou
  • Joel Z. Leibo
  • Audrunas Gruslys
Abstract

Deep reinforcement learning (RL) has achieved several high-profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages this data to massively accelerate the learning process even from relatively small amounts of demonstration data, and that automatically assesses the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstrator's actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN): it starts with better scores over the first million steps on 41 of 42 games, and on average it takes PDD DQN 82 million steps to catch up to DQfD's performance. DQfD learns to outperform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 17 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
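The abstract states that DQfD combines temporal difference updates with supervised classification of the demonstrator's actions. The Python sketch below illustrates one way such a combined per-transition loss could look. It is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a one-step Q-learning target, a large-margin classification loss of the form max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), and hypothetical weights `lambda_e` and `margin`; the function names are also hypothetical. Q-values are passed in as plain lists rather than produced by a network, and any other terms a full implementation might add (for example regularization) are omitted.

```python
from typing import Sequence


def td_loss(q_sa: float, reward: float, q_next: Sequence[float],
            gamma: float = 0.99, done: bool = False) -> float:
    """Squared one-step TD error: (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    # Terminal transitions do not bootstrap from the next state.
    target = reward if done else reward + gamma * max(q_next)
    return (target - q_sa) ** 2


def large_margin_loss(q_s: Sequence[float], expert_action: int,
                      margin: float = 0.8) -> float:
    """Assumed supervised term: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    l(a_E, a) is 0 when a == a_E and `margin` otherwise, so the loss is zero
    only when the demonstrated action beats every other action by `margin`.
    """
    augmented = [q + (0.0 if a == expert_action else margin)
                 for a, q in enumerate(q_s)]
    return max(augmented) - q_s[expert_action]


def combined_loss(q_s: Sequence[float], action: int, reward: float,
                  q_next: Sequence[float], is_demo: bool,
                  gamma: float = 0.99, lambda_e: float = 1.0,
                  margin: float = 0.8, done: bool = False) -> float:
    """TD loss for every transition, plus a weighted supervised term for demonstrations."""
    loss = td_loss(q_s[action], reward, q_next, gamma, done)
    if is_demo:
        # Only demonstration transitions carry a demonstrator action to classify.
        loss += lambda_e * large_margin_loss(q_s, action, margin)
    return loss


if __name__ == "__main__":
    q_s = [1.0, 2.0, 0.5]   # hypothetical Q(s, .) for three actions
    q_next = [0.5, 2.0]     # hypothetical Q(s', .)
    # TD term 1.0 plus a margin term of about 1.8 for this demonstration transition.
    print(combined_loss(q_s, 0, 1.0, q_next, is_demo=True, gamma=0.5))
```

For a demonstration transition, the supervised term vanishes once the demonstrated action's value exceeds every other action's value by at least the margin; agent-generated transitions receive only the TD term, since there is no demonstrator action to classify.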


Journal:
  • CoRR

Volume abs/1704.03732

Publication date 2017